Search CORE

12 research outputs found

Sparse and Non-Negative BSS for Noisy Data

Author: Bobin Jérôme
Larue Anthony
Rapin Jérémy
Starck Jean-Luc
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 26/08/2013

Non-negative blind source separation (BSS) has raised interest in various fields of research, as testified by the wide literature on the topic of non-negative matrix factorization (NMF). In this context, it is fundamental that the sources to be estimated present some diversity in order to be efficiently retrieved. Sparsity is known to enhance such contrast between the sources while producing very robust approaches, especially to noise. In this paper we introduce a new algorithm to tackle the blind separation of non-negative sparse sources from noisy measurements. We first show that sparsity and non-negativity constraints have to be carefully applied to the sought-after solution. In fact, improperly constrained solutions are unlikely to be stable and are therefore sub-optimal. The proposed algorithm, named nGMCA (non-negative Generalized Morphological Component Analysis), makes use of proximal calculus techniques to provide properly constrained solutions. The performance of nGMCA compared to other state-of-the-art algorithms is demonstrated by numerical experiments encompassing a wide variety of settings, with negligible parameter tuning. In particular, nGMCA is shown to provide robustness to noise and performs well on synthetic mixtures of real NMR spectra. Comment: 13 pages, 18 figures, to be published in IEEE Transactions on Signal Processing
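
As an illustration of the "proximal calculus" mentioned in the abstract above, the following is a minimal sketch, not the authors' nGMCA implementation. It assumes a fixed mixing matrix `A` and an l1 penalty of weight `lam` on the sources; the function names `prox_nonneg_l1` and `update_sources` are hypothetical. It relies on the standard fact that the proximal operator of an l1 penalty combined with a non-negativity constraint is a one-sided soft-thresholding.

```python
import numpy as np


def prox_nonneg_l1(x, threshold):
    """Proximal operator of threshold * ||x||_1 restricted to x >= 0.

    Soft-thresholding and projection onto the non-negative orthant
    collapse into a single one-sided thresholding.
    """
    return np.maximum(x - threshold, 0.0)


def update_sources(X, A, S, lam, n_iter=200):
    """Proximal-gradient update of S for a fixed A (illustrative only).

    Minimizes 0.5 * ||X - A S||_F^2 + lam * ||S||_1 subject to S >= 0.
    """
    # Lipschitz constant of the gradient: spectral norm of A^T A.
    lipschitz = np.linalg.norm(A.T @ A, 2)
    for _ in range(n_iter):
        grad = A.T @ (A @ S - X)
        S = prox_nonneg_l1(S - grad / lipschitz, lam / lipschitz)
    return S
```

Applying both constraints through a single proximal step keeps every iterate non-negative and sparse at once, which is one way to read the abstract's point that the constraints must be applied jointly and carefully.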

arXiv.org e-Print Archive

HAL-CEA

Sparse BSS in the presence of outliers

Author: Bobin Jérôme
Chenot Cécile
Rapin Jérémy
Publication venue: HAL CCSD
Publication date: 15/01/2015

Submitted to SPARS15. While real-world data are often grossly corrupted, most blind source separation (BSS) techniques give erroneous results in the presence of outliers. We propose a robust algorithm that jointly estimates the sparse sources and the outliers without requiring any prior knowledge of the outliers. More precisely, it uses an alternative weighted scheme to weaken the influence of the estimated outliers. A preliminary experiment is presented and demonstrates the advantage of the proposed algorithm over state-of-the-art BSS methods. I. PROBLEM FORMULATION Suppose we are given m noisy observations {X_i}, i = 1..m, of unknown linear mixtures of n ≤ m sparse sources {S_j}, j = 1..n, with t > m samples. It is generally assumed that these data are corrupted by Gaussian noise, accounting for instrumental or model imperfections. However, in many applications some entries are additionally corrupted by outliers, leading to the following model: X = AS + O + N, with X the observations, A the mixing matrix, S the sources, O the outliers, and N the Gaussian noise. In the presence of outliers, the key difficulty lies in separating the components O and AS. To this end, assuming that the term AS is low-rank, some strategies [4] suggest pre-processing the data to estimate and remove the outliers with RPCA [3]. However, besides the fact that the low-rank assumption is generally restrictive for most BSS problems, the source separation is severely hampered if the outliers are not well estimated. Therefore, we introduce a method that estimates the sources in the presence of the outliers without pre-processing. To the best of our knowledge, this problem has only been studied in [5], using the β-divergence. Unlike [5], we propose to estimate the outliers and the sources jointly by exploiting their sparsity. II. ALGORITH
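
The observation model X = AS + O + N from the problem formulation above can be simulated with a short sketch. Everything below is an assumption made for illustration: the function name `simulate_mixtures`, the default sizes, the Bernoulli-Gaussian sparsity model for S and O, and the noise levels are not taken from the paper.

```python
import numpy as np


def simulate_mixtures(m=8, n=3, t=500, source_density=0.05,
                      outlier_density=0.01, outlier_scale=20.0,
                      noise_std=0.1, seed=0):
    """Draw X = A S + O + N with sparse sources and sparse outliers.

    m observations, n <= m sources, t > m samples (hypothetical defaults).
    """
    rng = np.random.default_rng(seed)
    # Mixing matrix with unit-norm columns.
    A = rng.standard_normal((m, n))
    A /= np.linalg.norm(A, axis=0)
    # Sparse sources: Gaussian amplitudes on a random sparse support.
    S = rng.standard_normal((n, t)) * (rng.random((n, t)) < source_density)
    # Sparse, large-amplitude outliers hitting a few entries of X.
    O = outlier_scale * rng.standard_normal((m, t)) * (
        rng.random((m, t)) < outlier_density)
    # Dense Gaussian noise.
    N = noise_std * rng.standard_normal((m, t))
    X = A @ S + O + N
    return X, A, S, O, N
```

Returning the ground-truth A, S and O alongside X makes it possible to score any separation method against known components.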

HAL-CEA

Application of Non-negative Matrix Factorization to LC/MS data

Author: Bobin Jérôme
Junot Christophe
Larue Anthony
Ouethrani Minale
Rapin Jérémy
Souloumiac Antoine
Starck Jean-Luc
Publication venue: Elsevier BV
Publication date: 24/03/2015

Liquid Chromatography-Mass Spectrometry (LC/MS) provides large datasets from which one needs to extract the relevant information. Since these data are made of non-negative mixtures of non-negative mass spectra, non-negative matrix factorization (NMF) is well suited to processing them, but it has barely been used in LC/MS. These data are also very difficult to deal with, since they are usually contaminated with non-Gaussian noise and the intensities vary over several orders of magnitude. In this article, we show the feasibility of the NMF approach on these data. We also propose an adaptation of one of the algorithms aimed specifically at LC/MS data. Finally, we perform experiments and compare standard NMF algorithms on both simulated data and an annotated LC/MS dataset. This lets us evaluate the influence of the noise model and the data model on the recovery of the sources.
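
For readers unfamiliar with NMF, here is a minimal sketch of one standard baseline, the multiplicative updates of Lee and Seung for the Frobenius cost. It is not the adapted algorithm proposed in the article; the function name `nmf_multiplicative` and its defaults are hypothetical. The Frobenius cost corresponds to a Gaussian noise model, which the abstract notes is not a good match for LC/MS data.

```python
import numpy as np


def nmf_multiplicative(X, n_sources, n_iter=500, eps=1e-12, seed=0):
    """Factor a non-negative X (m x t) as A @ S with A, S >= 0.

    Uses multiplicative updates for 0.5 * ||X - A S||_F^2.
    """
    rng = np.random.default_rng(seed)
    m, t = X.shape
    # Strictly positive random initialization keeps the updates well defined.
    A = rng.random((m, n_sources)) + eps
    S = rng.random((n_sources, t)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so signs are preserved.
        S *= (A.T @ X) / (A.T @ A @ S + eps)
        A *= (X @ S.T) / (A @ S @ S.T + eps)
    return A, S
```

The rows of S would play the role of mass spectra and the columns of A their mixing profiles, under the assumption that the data matrix is arranged with one observation per row.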

HAL-CEA

Code Llama: Open Foundation Models for Code

Author: Adi Yossi
Azhar Faisal
Bhatt Manish
Bitton Joanna
Copet Jade
Défossez Alexandre
Evtimov Ivan
Ferrer Cristian Canton
Gat Itai
Gehring Jonas
Gloeckle Fabian
Grattafiori Aaron
Kozhevnikov Artyom
Liu Jingyu
Martin Louis
Rapin Jérémy
Remez Tal
Rozière Baptiste
Scialom Thomas
Sootla Sten
Synnaeve Gabriel
Tan Xiaoqing Ellen
Touvron Hugo
Usunier Nicolas
Xiong Wenhan
Publication date: 25/08/2023

We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B and 34B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. 7B and 13B Code Llama and Code Llama - Instruct variants support infilling based on surrounding content. Code Llama reaches state-of-the-art performance among open models on several code benchmarks, with scores of up to 53% and 55% on HumanEval and MBPP, respectively. Notably, Code Llama - Python 7B outperforms Llama 2 70B on HumanEval and MBPP, and all our models outperform every other publicly available model on MultiPL-E. We release Code Llama under a permissive license that allows for both research and commercial use

arXiv.org e-Print Archive

Sparse decompositions for advanced data analysis of hyperspectral data in biological applications

Author: Rapin Jérémy
Publication date: 19/12/2014

Blind source separation aims at extracting unknown source signals from observations in which these sources are mixed together by an unknown process. However, this very generic and unsupervised approach does not always provide exploitable results. It is therefore often necessary to add further constraints, generally arising from physical considerations, in order to favor the recovery of sources with a particular sought-after structure. Non-negative matrix factorization (NMF), which is the main focus of this thesis, searches for non-negative sources observed through non-negative linear mixtures. In some cases, further information remains necessary in order to separate the sources correctly. Here, we focus on the concept of sparsity, which helps improve the contrast between the sources while providing very robust approaches, even when the data are contaminated by noise. We show that, in order to obtain stable solutions, the non-negativity constraint and the sparse regularization must be applied adequately. In addition, sparsity in a potentially redundant transformed domain makes it possible to capture the structure of most natural signals, but this kind of regularization proves difficult to apply together with the non-negativity constraint in the direct domain. We therefore propose a sparse NMF algorithm, named nGMCA (non-negative Generalized Morphological Component Analysis), which overcomes these difficulties by making use of proximal calculus techniques. Experiments on simulated data show that this algorithm is robust to additive Gaussian noise contamination, with an automatic control of the sparsity parameter. This novel algorithm also proves to be more efficient and robust than other state-of-the-art NMF algorithms on realistic data. Finally, we apply nGMCA to liquid chromatography - mass spectrometry (LC-MS) data. Observation of these data shows that they are contaminated by multiplicative noise, which greatly deteriorates the results of NMF algorithms. An extension of nGMCA was designed to take this type of noise into account, thanks to the use of a non-stationary prior. This extension is then able to obtain excellent results on annotated real data.

Theses.fr

Décompositions parcimonieuses pour l'analyse avancée de données en spectrométrie pour la Santé

Author: Rapin Jérémy
Publication venue: HAL CCSD
Publication date: 19/12/2014

Blind source separation aims at extracting unknown source signals from observations in which these sources are mixed together by an unknown process. However, this very generic and unsupervised approach does not always provide exploitable results. It is therefore often necessary to add further constraints, generally arising from physical considerations, in order to favor the recovery of sources with a particular sought-after structure. Non-negative matrix factorization (NMF), which is the main focus of this thesis, searches for non-negative sources observed through non-negative linear mixtures. In some cases, further information remains necessary in order to separate the sources correctly. Here, we focus on the concept of sparsity, which helps improve the contrast between the sources while providing very robust approaches, even when the data are contaminated by noise. We show that, in order to obtain stable solutions, the non-negativity constraint and the sparse regularization must be applied adequately. In addition, sparsity in a potentially redundant transformed domain makes it possible to capture the structure of most natural signals, but this kind of regularization proves difficult to apply together with the non-negativity constraint in the direct domain. We therefore propose a sparse NMF algorithm, named nGMCA (non-negative Generalized Morphological Component Analysis), which overcomes these difficulties by making use of proximal calculus techniques. Experiments on simulated data show that this algorithm is robust to additive Gaussian noise contamination, with an automatic control of the sparsity parameter. This novel algorithm also proves to be more efficient and robust than other state-of-the-art NMF algorithms on realistic data. Finally, we apply nGMCA to liquid chromatography - mass spectrometry (LC-MS) data. Observation of these data shows that they are contaminated by multiplicative noise, which greatly deteriorates the results of NMF algorithms. An extension of nGMCA was designed to take this type of noise into account, thanks to the use of a non-stationary prior. This extension is then able to obtain excellent results on annotated real data.

Thèses en Ligne

HAL-CEA

NMF with Sparse Regularizations in Transformed Domains

Author: Bobin Jérôme
Larue Anthony
Rapin Jérémy
Starck Jean-Luc
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 01/01/2014

Nonnegative blind source separation, which is also referred to as nonnegative matrix factorization (NMF), is a very active field in domains as different as astrophysics, audio processing, and biomedical signal processing. In this context, the efficient retrieval of the sources requires the use of signal priors such as sparsity. Although NMF has been well studied with sparse constraints in the direct domain, only very few algorithms can encompass nonnegativity together with sparsity in a transformed domain, since simultaneously dealing with two priors in two different domains is challenging. In this paper, we show how a sparse NMF algorithm called nonnegative generalized morphological component analysis (nGMCA) can be extended to impose nonnegativity in the direct domain along with sparsity in a transformed domain, with both analysis and synthesis formulations. To the best of our knowledge, this work presents the first comparison of analysis and synthesis priors (as well as their reweighted versions) in the context of blind source separation. Comparisons with state-of-the-art NMF algorithms on realistic data show the efficiency as well as the robustness of the proposed algorithms.

arXiv.org e-Print Archive

HAL-CEA

The photonics and ARCoating testbeds in Nevergrad

Author: Bennet Pauline
Centeno Emmanuel
Moreau Antoine
Rapin Jérémy
Teytaud Olivier
Publication venue: HAL CCSD
Publication date: 19/05/2020

We detail the testbeds which can be found in the photonics and ARCoating test cases in Nevergrad. These testbeds, beyond their practical importance, are especially modular and, like all problems in photonics, possess a very high number of local minima. They constitute simple but crucial design challenges: designing a multilayered structure which can reflect light at a given frequency, reflect light over a whole range of frequencies, or cancel the reflectivity; and designing a complex structure able to diffract the blue part of the spectrum and to cancel the reflectivity at any other wavelength.
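
To make the multilayer design problems above concrete, here is a minimal sketch of the kind of quantity such an objective evaluates: the reflectance of a layer stack at normal incidence, computed with the textbook transfer-matrix method. This is not Nevergrad's implementation. The function name `reflectance`, the lossless-layer assumption, and the default refractive indices of the surrounding media are all assumptions made for illustration.

```python
import cmath
import math


def reflectance(indices, thicknesses, wavelength, n_in=1.0, n_out=1.5):
    """Normal-incidence reflectance of a stack of lossless layers.

    indices and thicknesses list the layers from the incidence side to
    the substrate; thicknesses and wavelength share the same unit.
    """
    # Start from the 2x2 identity and multiply in each layer's matrix.
    m11, m12, m21, m22 = 1.0 + 0j, 0j, 0j, 1.0 + 0j
    for n, d in zip(indices, thicknesses):
        delta = 2.0 * math.pi * n * d / wavelength  # phase thickness
        c, s = cmath.cos(delta), cmath.sin(delta)
        a11, a12, a21, a22 = c, 1j * s / n, 1j * n * s, c
        m11, m12, m21, m22 = (m11 * a11 + m12 * a21, m11 * a12 + m12 * a22,
                              m21 * a11 + m22 * a21, m21 * a12 + m22 * a22)
    b = m11 + m12 * n_out
    c_out = m21 + m22 * n_out
    r = (n_in * b - c_out) / (n_in * b + c_out)
    return abs(r) ** 2


# Usage: a quarter-wave stack (mirror-like) versus a single
# quarter-wave anti-reflective layer, both tuned to wavelength 600.
stack = [2.3, 1.4] * 10
mirror = reflectance(stack, [600.0 / (4 * n) for n in stack], 600.0)
n_ar = math.sqrt(1.5)
coating = reflectance([n_ar], [600.0 / (4 * n_ar)], 600.0)
```

A design objective can then be written as `1 - reflectance(...)` for a mirror or `reflectance(...)` for an anti-reflective coating, with the layer thicknesses as the variables to optimize. Because reflectance oscillates with each thickness, such objectives tend to have many local minima.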

HAL Clermont Université